64 research outputs found

    Algorithmic encoding of protected characteristics in chest X-ray disease detection models

    Background It has been rightfully emphasized that the use of AI for clinical decision making could amplify health disparities. An algorithm may encode protected characteristics, and then use this information for making predictions due to undesirable correlations in the (historical) training data. It remains unclear how we can establish whether such information is actually used. Besides the scarcity of data from underserved populations, very little is known about how dataset biases manifest in predictive models and how this may result in disparate performance. This article aims to shed some light on these issues by exploring methodology for subgroup analysis in image-based disease detection models. Methods We utilize two publicly available chest X-ray datasets, CheXpert and MIMIC-CXR, to study performance disparities across race and biological sex in deep learning models. We explore test set resampling, transfer learning, multitask learning, and model inspection to assess the relationship between the encoding of protected characteristics and disease detection performance across subgroups. Findings We confirm subgroup disparities in terms of shifted true and false positive rates which are partially removed after correcting for population and prevalence shifts in the test sets. We find that transfer learning alone is insufficient for establishing whether specific patient information is used for making predictions. The proposed combination of test-set resampling, multitask learning, and model inspection reveals valuable insights about the way protected characteristics are encoded in the feature representations of deep neural networks. Interpretation Subgroup analysis is key for identifying performance disparities of AI models, but statistical differences across subgroups need to be taken into account when analyzing potential biases in disease detection. 
The proposed methodology provides a comprehensive framework for subgroup analysis enabling further research into the underlying causes of disparities. Funding European Research Council Horizon 2020, UK Research and Innovation
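As an illustration of the test-set resampling idea mentioned in the abstract above, the following minimal sketch draws a test set in which every subgroup has the same size and the same disease prevalence. It is not the authors' implementation: the function name `resample_balanced`, its arguments, and the choice of sampling with replacement are all assumptions made for this example.

```python
import numpy as np

def resample_balanced(groups, labels, n_per_group, prevalence, seed=0):
    """Return test-set indices with equal size and prevalence per subgroup.

    Hypothetical helper: `groups` holds one subgroup name per sample,
    `labels` holds binary disease labels (1 = disease present).
    Assumes every subgroup has at least one positive and one negative case.
    """
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    labels = np.asarray(labels)
    n_pos = int(round(n_per_group * prevalence))
    n_neg = n_per_group - n_pos
    picked = []
    for g in np.unique(groups):
        pos = np.flatnonzero((groups == g) & (labels == 1))
        neg = np.flatnonzero((groups == g) & (labels == 0))
        # Sample with replacement so that small subgroups can still be matched.
        picked.append(rng.choice(pos, size=n_pos, replace=True))
        picked.append(rng.choice(neg, size=n_neg, replace=True))
    return np.concatenate(picked)
```

Evaluating true and false positive rates on the returned indices removes differences that stem only from subgroup size or prevalence, so any remaining gap is easier to attribute to the model itself.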

    Automatic localisation and per-region quantification of traumatic brain injury on head CT using atlas mapping

    Rationale and objectives To develop a method for automatic localisation of brain lesions on head CT, suitable for both population-level analysis and lesion management in a clinical setting. Materials and methods Lesions were located by mapping a bespoke CT brain atlas to the patient’s head CT in which lesions had been previously segmented. The atlas mapping was achieved through robust intensity-based registration enabling the calculation of per-region lesion volumes. Quality control (QC) metrics were derived for automatic detection of failure cases. The CT brain template was built using 182 non-lesioned CT scans and an iterative template construction strategy. Individual brain regions in the CT template were defined via non-linear registration of an existing MRI-based brain atlas. Evaluation was performed on a multi-centre traumatic brain injury (TBI) dataset (n = 839 scans), including visual inspection by a trained expert. Two population-level analyses are presented as proof-of-concept: a spatial assessment of lesion prevalence, and an exploration of the distribution of lesion volume per brain region, stratified by clinical outcome. Results 95.7% of the lesion localisation results were rated by a trained expert as suitable for approximate anatomical correspondence between lesions and brain regions, and 72.5% for more quantitatively accurate estimates of regional lesion load. The classification performance of the automatic QC showed an AUC of 0.84 when compared to binarised visual inspection scores. The localisation method has been integrated into the publicly available Brain Lesion Analysis and Segmentation Tool for CT (BLAST-CT). Conclusion Automatic lesion localisation with reliable QC metrics is feasible and can be used for patient-level quantitative analysis of TBI, as well as for large-scale population analysis due to its computational efficiency (<2 min/scan on GPU).
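Once an atlas has been mapped onto a scan, the per-region lesion volumes described above reduce to counting lesion voxels under each atlas label. The sketch below shows that counting step only. It is not taken from BLAST-CT: the function name, the convention that label 0 is background, and the millilitre voxel volume are assumptions made for this example.

```python
import numpy as np

def per_region_lesion_volume(atlas_labels, lesion_mask, voxel_volume_ml):
    """Return {region label: lesion volume in ml} for regions containing lesion.

    Hypothetical helper: `atlas_labels` is an integer label map already
    registered to the scan, `lesion_mask` is a binary segmentation of the
    same shape, and label 0 is assumed to be background.
    """
    atlas_labels = np.asarray(atlas_labels)
    lesion_mask = np.asarray(lesion_mask).astype(bool)
    if atlas_labels.shape != lesion_mask.shape:
        raise ValueError("atlas and lesion mask must have the same shape")
    n_labels = int(atlas_labels.max()) + 1
    # Count lesion voxels under each atlas label.
    counts = np.bincount(atlas_labels[lesion_mask].ravel(), minlength=n_labels)
    return {
        int(region): float(count * voxel_volume_ml)
        for region, count in enumerate(counts)
        if region != 0 and count > 0
    }
```

The registration and quality-control steps, which the abstract identifies as the hard part, are deliberately left out of this sketch.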

    Development of machine learning support for reading whole body diffusion-weighted MRI (WB-MRI) in myeloma for the detection and quantification of the extent of disease before and after treatment (MALIMAR): protocol for a cross-sectional diagnostic test accuracy study.

    INTRODUCTION: Whole-body MRI (WB-MRI) is recommended by the National Institute of Clinical Excellence as the first-line imaging tool for diagnosis of multiple myeloma. Reporting WB-MRI scans requires expertise and can be challenging for radiologists who need to meet rapid turn-around requirements. Automated computational tools based on machine learning (ML) could assist the radiologist in terms of sensitivity and reading speed and would facilitate improved accuracy, productivity and cost-effectiveness. The MALIMAR study aims to develop and validate an ML algorithm to increase the diagnostic accuracy and reading speed of radiological interpretation of WB-MRI compared with standard methods. METHODS AND ANALYSIS: This phase II/III imaging trial will perform retrospective analysis of previously obtained clinical radiology MRI scans and scans from healthy volunteers obtained prospectively to implement training and validation of an ML algorithm. The study will comprise three project phases using approximately 633 scans to (1) train the ML algorithm to identify active disease, (2) clinically validate the ML algorithm and (3) determine change in disease status following treatment via a quantification of burden of disease in patients with myeloma. Phase 1 will primarily train the ML algorithm to detect active myeloma against an expert assessment ('reference standard'). Phase 2 will use the ML output in the setting of a radiology reader study to assess the difference in sensitivity when using ML-assisted reading or human-alone reading. Phase 3 will assess the agreement between experienced readers (with and without ML) and the reference standard in scoring both overall burden of disease before and after treatment, and response. ETHICS AND DISSEMINATION: MALIMAR has ethical approval from South Central-Oxford C Research Ethics Committee (REC Reference: 17/SC/0630). IRAS Project ID: 233501. CPMS Portfolio adoption (CPMS ID: 36766).
Participants gave informed consent to participate in the study before taking part. MALIMAR is funded by National Institute for Healthcare Research Efficacy and Mechanism Evaluation funding (NIHR EME Project ID: 16/68/34). Findings will be made available through peer-reviewed publications and conference dissemination. TRIAL REGISTRATION NUMBER: NCT03574454

    Learning to segment when experts disagree

    Recent years have seen an increasing use of supervised learning methods for segmentation tasks. However, the predictive performance of these algorithms depends on the quality of labels, especially in the medical imaging domain, where both the annotation cost and inter-observer variability are high. In a typical annotation collection process, different clinical experts provide their estimates of the “true” segmentation labels under the influence of their levels of expertise and biases. Treating these noisy labels blindly as the ground truth can adversely affect the performance of supervised segmentation models. In this work, we present a neural network architecture for jointly learning, from noisy observations alone, both the reliability of individual annotators and the true segmentation label distributions. The separation of the annotators’ characteristics and true segmentation label is achieved by encouraging the estimated annotators to be maximally unreliable while achieving high fidelity with the training data. Our method can also be viewed as a translation of STAPLE, an established label aggregation framework proposed in Warfield et al. [1], to the supervised learning paradigm. We demonstrate the method first on a generic segmentation task using MNIST data and then adapt it for use with MRI scans of multiple sclerosis (MS) patients for lesion labelling. Our method shows considerable improvement over the relevant baselines on both datasets in terms of segmentation accuracy and estimation of annotator reliability, particularly when only a single label is available per image. An open-source implementation of our approach can be found at https://github.com/UCLBrain/MSLS

    Prediction of Thrombectomy Functional Outcomes using Multimodal Data

    Recent randomised clinical trials have shown that patients with ischaemic stroke (due to occlusion of a large intracranial blood vessel) benefit from endovascular thrombectomy. However, predicting the outcome of treatment in an individual patient remains a challenge. We propose a novel deep learning approach to directly exploit multimodal data (clinical metadata information, imaging data, and imaging biomarkers extracted from images) to estimate the success of endovascular treatment. We incorporate an attention mechanism in our architecture to model global feature inter-dependencies, both channel-wise and spatially. We perform comparative experiments using unimodal and multimodal data to predict functional outcome (modified Rankin Scale score, mRS) and achieve 0.75 AUC for dichotomised mRS scores and 0.35 classification accuracy for individual mRS scores. Comment: Accepted at Medical Image Understanding and Analysis (MIUA) 202

    Presurgical diffusion metrics of the thalamus and thalamic nuclei in postoperative delirium: a prospective two-centre cohort study in older patients

    BACKGROUND: The thalamus seems to be important in the development of postoperative delirium (POD) as previously revealed by volumetric and diffusion magnetic resonance imaging. In this observational cohort study, we aimed to further investigate the impact of the microstructural integrity of the thalamus and thalamic nuclei on the incidence of POD by applying diffusion kurtosis imaging (DKI). METHODS: Older patients without dementia (≥65 years) who were scheduled for major elective surgery received preoperative DKI at two study centres. The DKI metrics fractional anisotropy (FA), mean diffusivity (MD), mean kurtosis (MK) and free water (FW) were calculated for the thalamus and, as secondary outcome, for eight predefined thalamic nuclei and regions. Low FA and MK and, conversely, high MD and FW, indicate aspects of microstructural abnormality. To assess patients' POD status, the Nursing Delirium Screening Scale (Nu-DESC), Richmond Agitation Sedation Scale (RASS), Confusion Assessment Method (CAM) and Confusion Assessment Method for the Intensive Care Unit score (CAM-ICU) and chart review were applied twice a day after surgery for the duration of seven days or until discharge. For each metric and each nucleus, logistic regression was performed to assess the risk of POD. RESULTS: This analysis included the diffusion scans of 325 patients, of whom 53 (16.3%) developed POD. Independently of age, sex and study centre, thalamic MD was statistically significantly associated with POD [OR 1.65 per SD increment (95% CI 1.17 - 2.34) p = 0.004]. FA (p = 0.84), MK (p = 0.41) and FW (p = 0.06) were not significantly associated with POD in the examined sample. Exploration of thalamic nuclei also indicated that only the MD in certain areas of the thalamus was associated with POD. MD was increased in bilateral hemispheres, pulvinar nuclei, mediodorsal nuclei and the left anterior nucleus.
CONCLUSIONS: Microstructural abnormalities of the thalamus and thalamic nuclei, as reflected by increased MD, appear to predispose to POD. These findings affirm the thalamus as a region of interest in POD research

    ISLES 2016 and 2017-Benchmarking ischemic stroke lesion outcome prediction based on multispectral MRI

    The performance of models depends heavily not only on the algorithm used but also on the data set it was applied to. This makes the comparison of newly developed tools to previously published approaches difficult. Either researchers need to implement others' algorithms first, to establish an adequate benchmark on their data, or a direct comparison of new and old techniques is infeasible. The Ischemic Stroke Lesion Segmentation (ISLES) challenge, which has now run for 3 consecutive years, aims to address this problem of comparability. ISLES 2016 and 2017 focused on lesion outcome prediction after ischemic stroke: by providing a uniformly pre-processed data set, researchers from all over the world could apply their algorithm directly. A total of nine teams participated in ISLES 2015, and 15 teams participated in ISLES 2016. Their performance was evaluated in a fair and transparent way to identify the state-of-the-art among all submissions. Top ranked teams almost always employed deep learning tools, which were predominantly convolutional neural networks (CNNs). Despite the great efforts, lesion outcome prediction remains challenging. The annotated data set remains publicly available and new approaches can be compared directly via the online evaluation system, serving as a continuing benchmark (www.isles-challenge.org). Fundacao para a Ciencia e Tecnologia (FCT), Portugal (scholarship number PD/BD/113968/2015). FCT with the UID/EEA/04436/2013, by FEDER funds through COMPETE 2020, POCI-01-0145-FEDER-006941. NIH Blueprint for Neuroscience Research (T90DA022759/R90DA023427) and the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health under award number 5T32EB1680. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. PAC-PRECISE-LISBOA-01-0145-FEDER-016394. FEDER-POR Lisboa 2020-Programa Operacional Regional de Lisboa PORTUGAL 2020 and Fundacao para a Ciencia e a Tecnologia. GPU computing resources provided by the MGH and BWH Center for Clinical Data Science. Graduate School for Computing in Medicine and Life Sciences funded by Germany's Excellence Initiative [DFG GSC 235/2]. National Research Foundation of Korea (NRF) MSIT, NRF-2016R1C1B1012002, MSIT, No. 2014R1A4A1007895, NRF-2017R1A2B4008956. Swiss National Science Foundation-DACH 320030L_163363

    Post-acute blood biomarkers and disease progression in traumatic brain injury

    There is substantial interest in the potential for traumatic brain injury to result in progressive neurological deterioration. While blood biomarkers such as glial fibrillary acidic protein (GFAP) and neurofilament light have been widely explored in characterizing acute traumatic brain injury (TBI), their use in the chronic phase is limited. Given increasing evidence that these proteins may be markers of ongoing neurodegeneration in a range of diseases, we examined their relationship to imaging changes and functional outcome in the months to years following TBI. Two hundred and three patients were recruited in two separate cohorts: 6 months post-injury (n = 165) and >5 years post-injury (n = 38; 12 of whom also provided data ∼8 months post-TBI). Subjects underwent blood biomarker sampling (n = 199) and MRI (n = 172; including diffusion tensor imaging). Data from patient cohorts were compared to 59 healthy volunteers and 21 non-brain injury trauma controls. Mean diffusivity and fractional anisotropy were calculated in cortical grey matter, deep grey matter and whole brain white matter. Accelerated brain ageing was calculated at a whole brain level as the predicted age difference defined using T1-weighted images, and at a voxel-based level as the annualized Jacobian determinants in white matter and grey matter, referenced to a population of 652 healthy control subjects. Serum neurofilament light concentrations were elevated in the early chronic phase. While GFAP values were within the normal range at ∼8 months, many patients showed secondary and temporally distinct elevations up to >5 years after injury. Biomarker elevation at 6 months was significantly related to metrics of microstructural injury on diffusion tensor imaging. Biomarker levels at ∼8 months predicted white matter volume loss at >5 years, and annualized brain volume loss between ∼8 months and 5 years. Patients who worsened functionally between ∼8 months and >5 years showed higher than predicted brain age and elevated neurofilament light levels. GFAP and neurofilament light levels can remain elevated months to years after TBI, and show distinct temporal profiles. These elevations correlate closely with microstructural injury in both grey and white matter on contemporaneous quantitative diffusion tensor imaging. Neurofilament light elevations at ∼8 months may predict ongoing white matter and brain volume loss over >5 years of follow-up. If confirmed, these findings suggest that blood biomarker levels at late time points could be used to identify TBI survivors who are at high risk of progressive neurological damage.

    ISLES 2015 - A public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI

    Ischemic stroke is the most common cerebrovascular disease, and its diagnosis, treatment, and study rely on non-invasive imaging. Algorithms for stroke lesion segmentation from magnetic resonance imaging (MRI) volumes are intensely researched, but the reported results are largely incomparable due to different datasets and evaluation schemes. We approached this urgent problem of comparability with the Ischemic Stroke Lesion Segmentation (ISLES) challenge organized in conjunction with the MICCAI 2015 conference. In this paper we propose a common evaluation framework, describe the publicly available datasets, and present the results of the two sub-challenges: Sub-Acute Stroke Lesion Segmentation (SISS) and Stroke Perfusion Estimation (SPES). A total of 16 research groups participated with a wide range of state-of-the-art automatic segmentation algorithms. A thorough analysis of the obtained data enables a critical evaluation of the current state-of-the-art, recommendations for further developments, and the identification of remaining challenges. The segmentation of acute perfusion lesions addressed in SPES was found to be feasible. However, algorithms applied to sub-acute lesion segmentation in SISS still lack accuracy. Overall, no algorithmic characteristic of any method was found to perform superior to the others. Instead, the characteristics of stroke lesion appearances, their evolution, and the observed challenges should be studied in detail. The annotated ISLES image datasets continue to be publicly available through an online evaluation system to serve as an ongoing benchmarking resource (www.isles-challenge.org). Peer reviewed